AITopics | harry potter

Collaborating Authors

harry potter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Memorization with Compression

Neural Information Processing SystemsFeb-15-2026, 12:42:50 GMT

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Indiana > Marion County > Indianapolis (0.04)
North America > United States > California (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry:

Law (1.00)
Government > Regional Government (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
(3 more...)

Add feedback

FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline

Wu, Haotian, Jiang, Shufan, Chen, Mingyu, Feng, Yiyang, Lin, Hehai, Zou, Heqing, Shu, Yao, Qin, Chengwei

arXiv.org Artificial IntelligenceOct-14-2025

As large language models (LLMs) advance in role-playing (RP) tasks, existing benchmarks quickly become obsolete due to their narrow scope, outdated interaction paradigms, and limited adaptability across diverse application scenarios. To address this gap, we introduce FURINA-Builder, a novel multi-agent collaboration pipeline that automatically constructs fully customizable RP benchmarks at any scale. It enables evaluation of arbitrary characters across diverse scenarios and prompt formats, as the first benchmark builder in RP area for adaptable assessment. FURINA-Builder simulates dialogues between a test character and other characters drawn from a well-constructed character-scene pool, while an LLM judge selects fine-grained evaluation dimensions and adjusts the test character's responses into final test utterances. Using this pipeline, we build FURINA-Bench, a new comprehensive role-playing benchmark featuring both established and synthesized test characters, each assessed with dimension-specific evaluation criteria. Human evaluation and preliminary separability analysis justify our pipeline and benchmark design. We conduct extensive evaluations of cutting-edge LLMs and find that o3 and DeepSeek-R1 achieve the best performance on English and Chinese RP tasks, respectively. Across all models, established characters consistently outperform synthesized ones, with reasoning capabilities further amplifying this disparity. Interestingly, we observe that model scale does not monotonically reduce hallucinations. More critically, for reasoning LLMs, we uncover a novel trade-off: reasoning improves RP performance but simultaneously increases RP hallucinations. This trade-off extends to a broader Pareto frontier between RP performance and reliability for all LLMs. These findings demonstrate the effectiveness of FURINA-Builder and the challenge posed by FURINA-Bench.

dimension, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.068

Country:

North America > United States (0.92)
Asia > Middle East > UAE (0.27)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Leisure & Entertainment (0.94)
Education (0.92)
Media (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Memorization with Compression

Neural Information Processing SystemsOct-11-2025, 00:24:10 GMT

arxiv preprint arxiv, memorization, training data, (13 more...)

Neural Information Processing Systems

Country:

Europe (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Indiana > Marion County > Indianapolis (0.04)
North America > United States > California (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry:

Law (1.00)
Government > Regional Government (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
(3 more...)

Add feedback

Concept Unlearning in Large Language Models via Self-Constructed Knowledge Triplets

Yamashita, Tomoya, Yamanaka, Yuuki, Yamada, Masanori, Miura, Takayuki, Shibahara, Toshiki, Iwata, Tomoharu

arXiv.org Artificial IntelligenceSep-22-2025

Existing MU methods aim to remove specific target sentences from an LLM while minimizing damage to unrelated knowledge. However, these approaches require explicit target sentences and do not support removing broader concepts, such as persons or events. To address this limitation, we introduce Concept Unlearning (CU) as a new requirement for LLM unlearning. We leverage knowledge graphs to represent the LLM's internal knowledge and define CU as removing the forgetting target nodes and associated edges. This graph-based formulation enables a more intuitive unlearning and facilitates the design of more effective methods. We propose a novel method that prompts the LLM to generate knowledge triplets and explanatory sentences about the forgetting target and applies the unlearning process to these representations. Our approach enables more precise and comprehensive concept removal by aligning the unlearning process with the LLM's internal knowledge representations. Experiments on real-world and synthetic datasets demonstrate that our method effectively achieves concept-level unlearning while preserving unrelated knowledge.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.15621

Country:

Asia (1.00)
North America (0.67)
Europe > Finland (0.47)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Media > Film (0.68)
Leisure & Entertainment (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models

Xu, Xiaoyu, Du, Minxin, Ye, Qingqing, Hu, Haibo

arXiv.org Artificial IntelligenceSep-10-2025

Large language models (LLMs) trained over extensive corpora risk memorizing sensitive, copyrighted, or toxic content. To address this, we propose \textbf{OBLIVIATE}, a robust unlearning framework that removes targeted data while preserving model utility. The framework follows a structured process: extracting target tokens, building retain sets, and fine-tuning with a tailored loss function comprising three components -- masking, distillation, and world fact. Using low-rank adapters (LoRA) ensures efficiency without compromising unlearning quality. We conduct experiments on multiple datasets, including Harry Potter series, WMDP, and TOFU, using a comprehensive suite of metrics: \emph{forget quality} (via a new document-level memorization score), \emph{model utility}, and \emph{fluency}. Results demonstrate its effectiveness in resisting membership inference attacks, minimizing the impact on retained data, and maintaining robustness across diverse scenarios.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.04416

Country: Asia (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

PersonaEval: Are LLM Evaluators Human Enough to Judge Role-Play?

Zhou, Lingfeng, Zhang, Jialing, Gao, Jin, Jiang, Mohan, Wang, Dequan

arXiv.org Artificial IntelligenceAug-15-2025

Current role-play studies often rely on unvalidated LLM-as-a-judge paradigms, which may fail to reflect how humans perceive role fidelity. A key prerequisite for human-aligned evaluation is role identification, the ability to recognize who is speaking based on dialogue context. We argue that any meaningful judgment of role-playing quality (how well a character is played) fundamentally depends on first correctly attributing words and actions to the correct persona (who is speaking). We present PersonaEval, the first benchmark designed to test whether LLM evaluators can reliably identify human roles. PersonaEval uses human-authored dialogues from novels, scripts, and video transcripts, challenging models to determine the correct persona according to the conversation context. Our experiments, including a human study, show that even the best-performing LLMs reach only around 69% accuracy, well below the level needed for reliable evaluation. In contrast, human participants perform near ceiling with 90.8% accuracy, highlighting that current LLM evaluators are still not human enough to effectively judge role-play scenarios. To better understand this gap, we examine training-time adaptation and test-time compute, suggesting that reliable evaluation requires more than task-specific tuning, but depends on strong, human-like reasoning abilities in LLM evaluators. We release our benchmark at https://github.com/maple-zhou/PersonaEval.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.10014

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center

Wen, James, Nalawade, Sahil, Liang, Zhiwei, Bielick, Catherine, Boston, Marisa Ferrara, Chowdhury, Alexander, Collin, Adele, De Angelis, Luigi, Ellen, Jacob, Frase, Heather, Gameiro, Rodrigo R., Gutierrez, Juan Manuel, Kadam, Pooja, Keceli, Murat, Krishnamurthy, Srikanth, Kwok, Anne, Lu, Yanan Lance, Mattie, Heather, McCoy, Liam G., Miller, Katherine, Morgan, Allison C., Moerig, Marlene Louisa, Nguyen, Trang, Owen-Post, Alexander, Ruiz, Alex D., Puchala, Sreekar Reddy, Samineni, Soujanya, Tohyama, Takeshi, Ullanat, Varun, Valenza, Carmine, Velez, Camilo, Wang, Pengcheng, Wuest, Anna, Zhou, Yuxiang, Zhu, Yingde, Johnson, Jason M., Lenane, Naomi, Willcox, Jennifer, Vitiello, Francis J., Celi, Leo Anthony G., Umeton, Renato

arXiv.org Artificial IntelligenceJul-4-2025

Background: Generative artificial intelligence (AI) deployment in academic medical settings raises copyright compliance concerns. Dana-Farber Cancer Institute implemented GPT4DFCI, an internal generative AI tool utilizing OpenAI models, that is approved for enterprise use in research and operations. Given (1) the exceptionally broad adoption of the tool in our organization, (2) our research mission, and (3) the shared responsibility model required to benefit from Customer Copyright Commitment in Azure OpenAI Service products, we deemed rigorous copyright compliance testing necessary. Case Description: We conducted a structured red teaming exercise in Nov. 2024, with 42 participants from academic, industry, and government institutions. Four teams attempted to extract copyrighted content from GPT4DFCI across four domains: literary works, news articles, scientific publications, and access-restricted clinical notes. Teams successfully extracted verbatim book dedications and near-exact passages through various strategies. News article extraction failed despite jailbreak attempts. Scientific article reproduction yielded only high-level summaries. Clinical note testing revealed appropriate privacy safeguards. Discussion: The successful extraction of literary content indicates potential copyrighted material presence in training data, necessitating inference-time filtering. Differential success rates across content types suggest varying protective mechanisms. The event led to implementation of a copyright-specific meta-prompt in GPT4DFCI; this mitigation has been in production since Jan. 2025. Conclusion: Systematic red teaming revealed specific vulnerabilities in generative AI copyright compliance, leading to concrete mitigation strategies. Academic medical institutions deploying generative AI should implement continuous testing protocols to ensure legal and ethical compliance.

gpt4dfci, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2506.22523

Country:

North America > Canada (0.68)
North America > United States > Massachusetts (0.47)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.69)
Health & Medicine > Therapeutic Area > Oncology (0.49)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

Step-by-Step Reasoning Attack: Revealing 'Erased' Knowledge in Large Language Models

Sinha, Yash, Baser, Manit, Mandal, Murari, Divakaran, Dinil Mon, Kankanhalli, Mohan

arXiv.org Artificial IntelligenceJun-24-2025

Knowledge erasure in large language models (LLMs) is important for ensuring compliance with data and AI regulations, safeguarding user privacy, mitigating bias, and misinformation. Existing unlearning methods aim to make the process of knowledge erasure more efficient and effective by removing specific knowledge while preserving overall model performance, especially for retained information. However, it has been observed that the unlearning techniques tend to suppress and leave the knowledge beneath the surface, thus making it retrievable with the right prompts. In this work, we demonstrate that \textit{step-by-step reasoning} can serve as a backdoor to recover this hidden information. We introduce a step-by-step reasoning-based black-box attack, Sleek, that systematically exposes unlearning failures. We employ a structured attack framework with three core components: (1) an adversarial prompt generation strategy leveraging step-by-step reasoning built from LLM-generated queries, (2) an attack mechanism that successfully recalls erased content, and exposes unfair suppression of knowledge intended for retention and (3) a categorization of prompts as direct, indirect, and implied, to identify which query types most effectively exploit unlearning weaknesses. Through extensive evaluations on four state-of-the-art unlearning techniques and two widely used LLMs, we show that existing approaches fail to ensure reliable knowledge removal. Of the generated adversarial prompts, 62.5% successfully retrieved forgotten Harry Potter facts from WHP-unlearned Llama, while 50% exposed unfair suppression of retained knowledge. Our work highlights the persistent risks of information leakage, emphasizing the need for more robust unlearning strategies for erasure.

hogwart, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2506.17279

Country: Asia (0.14)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Education (0.96)
Information Technology > Security & Privacy (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Harry Potter is Still Here! Probing Knowledge Leakage in Targeted Unlearned Large Language Models via Automated Adversarial Prompting

To, Bang Trinh Tran, Le, Thai

arXiv.org Artificial IntelligenceMay-26-2025

This work presents LURK (Latent UnleaRned Knowledge), a novel framework that probes for hidden retained knowledge in unlearned LLMs through adversarial suffix prompting. LURK automatically generates adversarial prompt suffixes designed to elicit residual knowledge about the Harry Potter domain, a commonly used benchmark for unlearning. Our experiments reveal that even models deemed successfully unlearned can leak idiosyncratic information under targeted adversarial conditions, highlighting critical limitations of current unlearning evaluation standards. By uncovering latent knowledge through indirect probing, LURK offers a more rigorous and diagnostic tool for assessing the robustness of unlearning algorithms. All code will be publicly available.

harry potter, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.1716

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.93)
Law (0.93)
Media > Film (0.73)
Leisure & Entertainment (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance

Huang, Chenghua, Wang, Lu, Yang, Fangkai, Zhao, Pu, Li, Zhixu, Lin, Qingwei, Zhang, Dongmei, Rajmohan, Saravan, Zhang, Qi

arXiv.org Artificial IntelligenceFeb-24-2025

Proximal Policy Optimization (PPO)-based Reinforcement Learning from Human Feedback (RLHF) is essential for aligning large language models (LLMs) with human preferences. It requires joint training of an actor and critic with a pretrained, fixed reward model for guidance. This approach increases computational complexity and instability due to actor-critic interdependence. Additionally, PPO lacks access to true environment rewards in LLM tasks, limiting its adaptability. Under such conditions, pretraining a value model or a reward model becomes equivalent, as both provide fixed supervisory signals without new ground-truth feedback. To address these issues, we propose \textbf{Decoupled Value Policy Optimization (DVPO)}, a lean framework that replaces traditional reward modeling with a pretrained \emph{global value model (GVM)}. The GVM is conditioned on policy trajectories and predicts token-level return-to-go estimates. By decoupling value model from policy training (via frozen GVM-driven RL objectives), DVPO eliminates actor-critic interdependence, reducing GPU memory usage by 40\% and training time by 35\% compared to conventional RLHF. Experiments across benchmarks show DVPO outperforms efficient RLHF methods (e.g., DPO) while matching state-of-the-art PPO in performance.

arxiv preprint arxiv, policy optimization, value model, (11 more...)

arXiv.org Artificial Intelligence

2502.16944

Country:

Europe > Greece (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback